AITopics | direction vector

direction vector

Enhancing Long Chain-of-Thought Reasoning through Multi-Path Plan Aggregation

Xiong, Siheng, Payani, Ali, Fekri, Faramarz

arXiv.org Artificial Intelligence, Oct-20-2025

[...] Monte Carlo (TSMC) to provide scalable stepwise supervision using small LMs. This yields more efficient training, improved stability, and higher accuracy. OpenAI's o1 series (OpenAI, 2024) introduces inference-time scaling by increasing the length of the Chain-of-Thought (CoT) (Wei et al., 2022) reasoning process. Despite their empirical success, RL approaches that generate the entire reasoning chain in a single forward pass face notable limitations, including CoT derailment, where the reasoning trajectory drifts off course due to accumulated errors, and the inherent challenges of long-horizon RL with sparse outcome rewards. This sequential scaling strategy, i.e., simply extending the CoT length, can therefore be insufficient (Yang et al., 2025). To improve planning quality, we introduce Multi-Path Plan Aggregation (MPPA). For each planning step, the model generates multiple alternative plans and aggregates them into an improved plan before proceeding to the subsequent execution steps. Beyond enhancing planning, we identify a fundamental challenge in credit assignment for long-horizon policy learning (Kaelbling et al., 1996). Existing RL fine-tuning frameworks struggle to provide effective process-level supervision (Guo et al., 2025). First, evaluating the correctness of intermediate steps is inherently difficult. Automated annotation using LLM judges (Gu et al., 2024) often yields unreliable or noisy signals. Second, introducing a separate process reward model (PRM) adds complexity. We then define the process preference between two candidate continuations at the same step by comparing their incremental log-weights. We repurpose Twisted Sequential Monte Carlo (TSMC) to provide process-level preferences for online Step-DPO training. Results show that our approach consistently outperforms both distillation-based long-CoT methods and RL methods that rely solely on outcome rewards.
Chain-of-Thought trajectories can be lengthy, and the position of the first error varies considerably, making outcome-based RL fine-tuning on long trajectories highly inefficient.
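
The step-level preference described in this abstract (comparing the incremental log-weights of two candidate continuations at the same step) can be illustrated with a minimal sketch. The abstract does not say how the log-weights are computed, so they are treated as given inputs here. The `Candidate` container, the `prefer` function, and the `margin` parameter are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate continuation at a reasoning step (hypothetical container)."""
    text: str
    log_weight_before: float  # cumulative log-weight before this step (assumed given)
    log_weight_after: float   # cumulative log-weight after this step (assumed given)

    @property
    def incremental_log_weight(self) -> float:
        # How much this single step changed the cumulative log-weight.
        return self.log_weight_after - self.log_weight_before


def prefer(a: Candidate, b: Candidate, margin: float = 0.0):
    """Return (chosen, rejected), or None when the gap is within the margin.

    The margin is an illustrative assumption used to skip near-ties.
    """
    gap = a.incremental_log_weight - b.incremental_log_weight
    if abs(gap) <= margin:
        return None
    return (a, b) if gap > 0 else (b, a)


plan_a = Candidate("factor the quadratic", -2.0, -2.5)  # increment -0.5
plan_b = Candidate("guess and check", -2.0, -4.0)       # increment -2.0
pair = prefer(plan_a, plan_b)
```

Each `(chosen, rejected)` pair produced this way is the kind of record a preference-optimization step such as Step-DPO would consume.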

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.1162

Genre:

Workflow (0.87)
Research Report > New Finding (0.48)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.44)

Conversational Orientation Reasoning: Egocentric-to-Allocentric Navigation with Multimodal Chain-of-Thought

Huang, Yu Ti

arXiv.org Artificial Intelligence, Sep-24-2025

Conversational agents must translate egocentric utterances (e.g., "on my right") into allocentric orientations (N/E/S/W). This challenge is particularly critical in indoor or complex facilities where GPS signals are weak and detailed maps are unavailable. While chain-of-thought (CoT) prompting has advanced reasoning in language and vision tasks, its application to multimodal spatial orientation remains underexplored. We introduce Conversational Orientation Reasoning (COR), a new benchmark designed for Traditional Chinese conversational navigation projected from real-world environments, addressing egocentric-to-allocentric reasoning in non-English and ASR-transcribed scenarios. We propose a multimodal chain-of-thought (MCoT) framework, which integrates ASR-transcribed speech with landmark coordinates through a structured three-step reasoning process: (1) extracting spatial relations, (2) mapping coordinates to absolute directions, and (3) inferring user orientation. A curriculum learning strategy progressively builds these capabilities on Taiwan-LLM-13B-v2.0-Chat, a mid-sized model representative of resource-constrained settings. Experiments show that MCoT achieves 100% orientation accuracy on clean transcripts and 98.1% with ASR transcripts, substantially outperforming unimodal and non-structured baselines. Moreover, MCoT demonstrates robustness under noisy conversational conditions, including ASR recognition errors and multilingual code-switching. The model also maintains high accuracy in cross-domain evaluation and resilience to linguistic variation, domain shift, and referential ambiguity. These findings highlight the potential of structured MCoT spatial reasoning as a path toward interpretable and resource-efficient embodied navigation.
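
Steps (2) and (3) of the three-step reasoning process can be sketched in a few lines. This is a hand-written geometric illustration, not the paper's learned model. It assumes the spatial relation from step (1) has already been extracted, that coordinates use x = east and y = north, and that only the four cardinal directions matter. The names `absolute_direction` and `infer_heading` are hypothetical.

```python
# Compass directions in clockwise order.
COMPASS = ["N", "E", "S", "W"]

# Clockwise quarter-turns from the user's heading to each egocentric relation.
RELATION_TURNS = {"front": 0, "right": 1, "behind": 2, "left": 3}


def absolute_direction(user_xy, landmark_xy):
    """Step 2: map coordinates to the dominant compass direction (x = east, y = north)."""
    dx = landmark_xy[0] - user_xy[0]
    dy = landmark_xy[1] - user_xy[1]
    if dx == 0 and dy == 0:
        raise ValueError("user and landmark coincide; direction is undefined")
    if abs(dx) >= abs(dy):
        return "E" if dx > 0 else "W"
    return "N" if dy > 0 else "S"


def infer_heading(relation, user_xy, landmark_xy):
    """Step 3: rotate the landmark's absolute direction back by the egocentric relation."""
    landmark_dir = absolute_direction(user_xy, landmark_xy)
    idx = (COMPASS.index(landmark_dir) - RELATION_TURNS[relation]) % 4
    return COMPASS[idx]


# "The landmark is on my right", and it lies to the east of the user.
heading = infer_heading("right", (0, 0), (5, 1))  # "N"
```

If the landmark is to the east and the user reports it on their right, the user must be facing north, which is the rotation the modular arithmetic encodes.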

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.182

Country:

North America (0.46)
Asia > Taiwan (0.37)

Genre: Research Report (0.68)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hybrid Perception and Equivariant Diffusion for Robust Multi-Node Rebar Tying

Wang, Zhitao, Xiong, Yirong, Horowitz, Roberto, Wang, Yanke, Han, Yuxing

arXiv.org Artificial Intelligence, Sep-3-2025

Rebar tying is a repetitive but critical task in reinforced concrete construction, typically performed manually at considerable ergonomic risk. Recent advances in robotic manipulation hold the potential to automate the tying process, yet face challenges in accurately estimating tying poses in congested rebar nodes. In this paper, we introduce a hybrid perception and motion planning approach that integrates geometry-based perception with Equivariant Denoising Diffusion on SE(3) (Diffusion-EDFs) to enable robust multi-node rebar tying with minimal training data. Our perception module utilizes density-based clustering (DBSCAN), geometry-based node feature extraction, and principal component analysis (PCA) to segment rebar bars, identify rebar nodes, and estimate orientation vectors for sequential ranking, even in complex, unstructured environments. The motion planner, based on Diffusion-EDFs, is trained on as few as 5-10 demonstrations to generate sequential end-effector poses that optimize collision avoidance and tying efficiency. The proposed system is validated on various rebar meshes, including single-layer, multi-layer, and cluttered configurations, demonstrating high success rates in node detection and accurate sequential tying. Compared with conventional approaches that rely on large datasets or extensive manual parameter tuning, our method achieves robust, efficient, and adaptable multi-node tying while significantly reducing data requirements. This result underscores the potential of hybrid perception and diffusion-driven planning to enhance automation in on-site construction tasks, improving both safety and labor efficiency.
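
The idea of estimating an orientation vector for a segmented bar with PCA can be sketched as follows. The first principal component of a point cluster is the dominant eigenvector of its covariance matrix. To stay dependency-free, this sketch finds it with power iteration, which is a stand-in chosen for illustration and not the paper's implementation. The function name, the start vector, and the toy points are assumptions.

```python
import math


def principal_direction(points, iterations=100):
    """Estimate the dominant axis of 3D points (first principal component).

    Uses power iteration on the 3x3 covariance matrix. The result is a unit
    vector; its sign is arbitrary, as with any principal axis.
    """
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - mean[k] for k in range(3)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
           for i in range(3)]

    # Arbitrary start vector; power iteration fails only if it happens to be
    # exactly orthogonal to the dominant axis.
    v = [1.0, 0.7, 0.3]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("points have no spread along the start vector")
        v = [x / norm for x in w]
    return v


# Toy cluster: points along the x-axis with a small alternating offset in y.
bar_points = [(float(t), 0.01 * (-1) ** t, 0.0) for t in range(10)]
axis = principal_direction(bar_points)
```

For the toy cluster the returned axis is almost exactly aligned with x, which is the bar's long direction.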

artificial intelligence, machine learning, point cloud, (14 more...)

arXiv.org Artificial Intelligence

2509.00065

Country:

North America > United States (0.46)
Asia > China (0.29)
Europe > Spain (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Materials > Construction Materials (0.73)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

On Reasoning Strength Planning in Large Reasoning Models

Sheng, Leheng, Zhang, An, Wu, Zijian, Zhao, Weixiang, Shen, Changshuo, Zhang, Yi, Wang, Xiang, Chua, Tat-Seng

arXiv.org Artificial Intelligence, Jun-11-2025

Recent studies empirically reveal that large reasoning models (LRMs) can automatically allocate more reasoning strengths (i.e., the number of reasoning tokens) for harder problems, exhibiting difficulty-awareness for better task performance. While this automatic reasoning strength allocation phenomenon has been widely observed, its underlying mechanism remains largely unexplored. To this end, we provide explanations for this phenomenon from the perspective of model activations. We find evidence that LRMs pre-plan the reasoning strengths in their activations even before generation, with this reasoning strength causally controlled by the magnitude of a pre-allocated directional vector. Specifically, we show that the number of reasoning tokens is predictable solely based on the question activations using linear probes, indicating that LRMs estimate the required reasoning strength in advance. We then uncover that LRMs encode this reasoning strength through a pre-allocated directional vector embedded in the activations of the model, where the vector's magnitude modulates the reasoning strength. Subtracting this vector can lead to reduced reasoning token number and performance, while adding this vector can lead to increased reasoning token number and even improved performance. We further reveal that this direction vector consistently yields positive reasoning length prediction, and it modifies the logits of end-of-reasoning token to affect the reasoning length. Finally, we demonstrate two potential applications of our findings: overthinking behavior detection and enabling efficient reasoning on simple problems. Our work provides new insights into the internal mechanisms of reasoning in LRMs and offers practical tools for controlling their reasoning behaviors. Our code is available at https://github.com/AlphaLab-USTC/LRM-plans-CoT.
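
Adding or subtracting a direction vector in activation space can be shown on toy data. The abstract does not state how the vector is extracted, so the difference-of-means recipe below is an assumption, and the two-dimensional "activations" are made up for illustration. The authors' actual code is in the repository linked above.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def direction_vector(hard_acts, easy_acts):
    """Difference of mean activations (assumed recipe, not confirmed by the abstract)."""
    mean_hard = mean_vector(hard_acts)
    mean_easy = mean_vector(easy_acts)
    return [h - e for h, e in zip(mean_hard, mean_easy)]


def steer(activation, direction, scale):
    """Add (scale > 0) or subtract (scale < 0) the direction from one activation."""
    return [a + scale * d for a, d in zip(activation, direction)]


def projection(activation, direction):
    """Signed length of the activation along the direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * d for a, d in zip(activation, direction)) / norm


# Toy activations for questions labeled hard and easy (made-up numbers).
hard = [[2.0, 1.0], [4.0, -1.0]]
easy = [[0.0, 1.0], [0.0, -1.0]]
direction = direction_vector(hard, easy)        # [3.0, 0.0]

original = [1.0, 1.0]
pushed_up = steer(original, direction, 0.5)     # [2.5, 1.0]
pushed_down = steer(original, direction, -0.5)  # [-0.5, 1.0]
```

The projection of an activation onto the direction plays the role of the magnitude the abstract describes: steering with a positive scale raises it and a negative scale lowers it.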

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.0839

Country: Asia (0.46)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
(2 more...)

RF-Source Seeking with Obstacle Avoidance using Real-time Modified Artificial Potential Fields in Unknown Environments

Mulla, Shahid Mohammad, Kanakapudi, Aryan, Narasimhan, Lakshmi, Tiwari, Anuj

arXiv.org Artificial Intelligence, Jun-10-2025

Navigation of UAVs in unknown environments with obstacles is essential for applications in disaster response and infrastructure monitoring. However, existing obstacle avoidance algorithms such as Artificial Potential Field (APF) are unable to generalize across environments with different obstacle configurations. Furthermore, the precise location of the final target may not be available in applications such as search and rescue, in which case approaches such as RF source seeking can be used to align towards the target location. This paper proposes a real-time trajectory planning method, which involves real-time adaptation of APF through a sampling-based approach. The proposed approach utilizes only the bearing angle of the target without its precise location, and adjusts the potential field parameters according to the environment with new obstacle configurations in real time. The main contributions of the article are i) RF source seeking algorithm to provide a bearing angle estimate using RF signal calculations based on antenna placement, and ii) modified APF for adaptable collision avoidance in changing environments, which are evaluated separately in the simulation software Gazebo, using ROS2 for communication. Simulation results show that the RF source-seeking algorithm achieves high accuracy, with an average angular error of just 1.48 degrees, and with this estimate, the proposed navigation algorithm improves the success rate of reaching the target by 46% and reduces the trajectory length by 1.2% compared to standard potential fields. The increasing use of drones in various applications has been facilitated by advancements in sensor technology, enabling better localization and obstacle detection methods. These technologies allow drones to effectively navigate through complex environments, avoiding obstacles in real time.
The demand for autonomous drone navigation is growing in sectors like search and rescue [1], inspection of unknown areas [2], and other critical applications requiring drones to operate in unfamiliar and potentially hazardous environments. In these scenarios, drones must autonomously identify and locate targets, update environmental maps in real time, detect obstacles, and plan safe trajectories. The variability of these environments, such as changes in obstacle sizes, distances, and spatial constraints, poses a significant challenge to creating a unified navigation system that can adapt to such differing conditions.
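
A bearing-only potential field step can be sketched as below. This is the textbook APF form with the attractive term replaced by a unit pull along the estimated bearing. It is not the paper's modified APF, and it omits the sampling-based parameter adaptation. The gains `k_att` and `k_rep`, the `influence` radius, and the `step` size are illustrative assumptions.

```python
import math


def apf_step(pos, bearing_rad, obstacles,
             k_att=1.0, k_rep=1.0, influence=2.0, step=0.1):
    """Move one fixed-length step along the combined potential-field force.

    pos:         (x, y) current position
    bearing_rad: estimated bearing to the target, in radians from the +x axis
    obstacles:   iterable of (x, y) obstacle positions
    """
    # Attraction: the target position is unknown, so pull along the bearing.
    fx = k_att * math.cos(bearing_rad)
    fy = k_att * math.sin(bearing_rad)

    # Repulsion: classic inverse-distance term, active inside the influence radius.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            gain = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += gain * dx / d
            fy += gain * dy / d

    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return (pos[0], pos[1])  # forces cancel: a local minimum
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)


free_move = apf_step((0.0, 0.0), 0.0, [])               # straight along +x
deflected = apf_step((0.0, 0.0), 0.0, [(1.0, -0.5)])    # pushed away from the obstacle
```

With an obstacle just below the straight-line path, the step gains a positive y component, so the vehicle veers away while still advancing toward the bearing.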

artificial intelligence, real time system, trajectory, (18 more...)

arXiv.org Artificial Intelligence

2506.06811

Country: Europe (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Transportation (0.34)

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.85)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.34)

Decouple and Orthogonalize: A Data-Free Framework for LoRA Merging

Zheng, Shenghe, Wang, Hongzhi, Huang, Chenyu, Wang, Xiaohui, Chen, Tao, Fan, Jiayuan, Hu, Shuyue, Ye, Peng

arXiv.org Artificial Intelligence, May-23-2025

With more open-source models available for diverse tasks, model merging has gained attention by combining models into one, reducing training, storage, and inference costs. Current research mainly focuses on model merging for full fine-tuning, overlooking the popular LoRA. However, our empirical analysis reveals that: a) existing merging methods designed for full fine-tuning perform poorly on LoRA; b) LoRA modules show much larger parameter magnitude variance than full fine-tuned weights; c) greater parameter magnitude variance correlates with worse merging performance. Considering that large magnitude variances cause deviations in the distribution of the merged parameters, resulting in information loss and performance degradation, we propose a Decoupled and Orthogonal merging approach (DO-Merging). By separating parameters into magnitude and direction components and merging them independently, we reduce the impact of magnitude differences on the directional alignment of the merged models, thereby preserving task information. Furthermore, we introduce a data-free, layer-wise gradient descent method with orthogonal constraints to mitigate interference during the merging of direction components. We provide theoretical guarantees for both the decoupling and orthogonal components. We also validate through extensive experiments across vision, language, and multi-modal domains that our proposed DO-Merging can achieve significantly higher performance than existing merging methods at a minimal cost. Notably, each component can be flexibly integrated with existing methods, offering near free-lunch improvements across tasks.
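
The decoupling idea (split each parameter vector into a magnitude and a direction, then merge the two parts independently) can be illustrated on flat vectors. Plain averaging of each part is an assumption made for this sketch. The abstract does not specify the merge rule, and the orthogonality-constrained gradient descent step is not reproduced here. `split` and `merge_decoupled` are hypothetical names.

```python
import math


def split(vec):
    """Return (magnitude, unit direction) of a non-zero vector."""
    mag = math.sqrt(sum(x * x for x in vec))
    if mag == 0.0:
        raise ValueError("cannot take the direction of a zero vector")
    return mag, [x / mag for x in vec]


def merge_decoupled(vectors):
    """Average magnitudes and directions separately, then recombine."""
    parts = [split(v) for v in vectors]
    mean_mag = sum(m for m, _ in parts) / len(parts)
    summed_dir = [sum(col) for col in zip(*(d for _, d in parts))]
    _, mean_dir = split(summed_dir)  # renormalize the averaged direction
    return [mean_mag * x for x in mean_dir]


# Two toy "task" vectors with very different magnitudes.
task_a = [10.0, 0.0]
task_b = [0.0, 1.0]

naive = [(a + b) / 2 for a, b in zip(task_a, task_b)]  # [5.0, 0.5], dominated by task_a
merged = merge_decoupled([task_a, task_b])             # both directions weighted equally
```

In the naive average the large-magnitude vector swamps the small one, while the decoupled merge keeps both directions equally represented, which is the effect of magnitude variance that the abstract describes.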

arxiv preprint arxiv, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.15875

Country:

North America > United States > Minnesota (0.28)
Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Responsible Diffusion Models via Constraining Text Embeddings within Safe Regions

Li, Zhiwen, Chen, Die, Fan, Mingyuan, Chen, Cen, Li, Yaliang, Wang, Yanhao, Zhou, Wenmeng

arXiv.org Artificial Intelligence, May-22-2025

The remarkable ability of diffusion models to generate high-fidelity images has led to their widespread adoption. However, concerns have also arisen regarding their potential to produce Not Safe for Work (NSFW) content and exhibit social biases, hindering their practical use in real-world applications. In response to this challenge, prior work has focused on employing security filters to identify and exclude toxic text, or alternatively, fine-tuning pre-trained diffusion models to erase sensitive concepts. Unfortunately, existing methods struggle to achieve satisfactory performance in the sense that they can have a significant impact on the normal model output while still failing to prevent the generation of harmful content in some cases. In this paper, we propose a novel self-discovery approach to identifying a semantic direction vector in the embedding space to restrict text embedding within a safe region. Our method circumvents the need for correcting individual words within the input text and steers the entire text prompt towards a safe region in the embedding space, thereby enhancing model robustness against all possibly unsafe prompts. In addition, we employ Low-Rank Adaptation (LoRA) for semantic direction vector initialization to reduce the impact on the model performance for other semantics. Furthermore, our method can also be integrated with existing methods to improve their social responsibility. Extensive experiments on benchmark datasets demonstrate that our method can effectively reduce NSFW content and mitigate social bias generated by diffusion models compared to several state-of-the-art baselines.

artificial intelligence, direction vector, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.15427

Country:

Asia > China (0.29)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.69)
Law (0.68)
Social Sector (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States

Chia, Xin Wei, Pan, Jonathan

arXiv.org Artificial Intelligence, Mar-12-2025

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they remain vulnerable to adversarial manipulations such as jailbreaking via prompt injection attacks. These attacks bypass safety mechanisms to generate restricted or harmful content. In this study, we investigated the underlying latent subspaces of safe and jailbroken states by extracting hidden activations from an LLM. Inspired by attractor dynamics in neuroscience, we hypothesized that LLM activations settle into semi-stable states that can be identified and perturbed to induce state transitions. Using dimensionality reduction techniques, we projected activations from safe and jailbroken responses to reveal latent subspaces in lower-dimensional spaces. We then derived a perturbation vector that, when applied to safe representations, shifted the model towards a jailbreak state. Our results demonstrate that this causal intervention results in statistically significant jailbreak responses in a subset of prompts. Next, we probed how these perturbations propagate through the model's layers, testing whether the induced state change remains localized or cascades throughout the network. Our findings indicate that targeted perturbations induced distinct shifts in activations and model responses. Our approach paves the way for potential proactive defenses, shifting from traditional guardrail-based methods to preemptive, model-agnostic techniques that neutralize adversarial states at the representation level.

activation, perturbation, representation, (16 more...)

arXiv.org Artificial Intelligence

2503.09066

Country: Asia > Singapore (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Grimm: A Plug-and-Play Perturbation Rectifier for Graph Neural Networks Defending against Poisoning Attacks

Liu, Ao, Li, Wenshan, Li, Beibei, Ma, Wengang, Li, Tao, Zhou, Pan

arXiv.org Artificial Intelligence, Dec-19-2024

Recent studies have revealed the vulnerability of graph neural networks (GNNs) to adversarial poisoning attacks on node classification tasks. Current defensive methods require substituting the original GNNs with defense models, regardless of the original's type. This approach, while targeting adversarial robustness, compromises the enhancements developed in prior research to boost GNNs' practical performance. Here we introduce Grimm, the first plug-and-play defense model. With just a minimal interface requirement for extracting features from any layer of the protected GNNs, Grimm can seamlessly rectify perturbations. Specifically, we utilize the feature trajectories (FTs) generated by GNNs, as they evolve through epochs, to reflect the training status of the networks. We then theoretically prove that the FTs of victim nodes will inevitably exhibit discriminable anomalies. Consequently, inspired by the natural parallelism between the biological nervous and immune systems, we construct Grimm, a comprehensive artificial immune system for GNNs. Grimm not only detects abnormal FTs and rectifies adversarial edges during training but also operates efficiently in parallel, thereby mirroring the concurrent functionalities of its biological counterparts. We experimentally confirm that Grimm offers four advantages: 1) Harmlessness, as it does not actively interfere with GNN training; 2) Parallelism, ensuring monitoring, detection, and rectification functions operate independently of the GNN training process; 3) Generalizability, demonstrating compatibility with mainstream GNNs such as GCN, GAT, and GraphSAGE; and 4) Transferability, as the detectors for abnormal FTs can be efficiently transferred across different systems for one-step rectification.

artificial intelligence, machine learning, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2412.08555

Country:

North America > United States (0.14)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Filtration learning in exact multi-parameter persistent homology and classification of time-series data

Kim, Keunsu, Jung, Jae-Hun

arXiv.org Machine Learning, Jun-27-2024

To analyze the topological properties of the given discrete data, one needs to consider a continuous transform called filtration. Persistent homology serves as a tool to track changes of homology in the filtration. The outcome of the topological analysis of data varies depending on the choice of filtration, making the selection of filtration crucial. Filtration learning is an attempt to find an optimal filtration that minimizes the loss function. Exact Multi-parameter Persistent Homology (EMPH) has been recently proposed, particularly for topological time-series analysis, that utilizes the exact formula of rank invariant instead of calculating it. In this paper, we propose a framework for filtration learning of EMPH. We formulate an optimization problem and propose an algorithm for solving the problem. We then apply the proposed algorithm to several classification problems. Particularly, we derive the exact formula of the gradient of the loss function with respect to the filtration parameter, which makes it possible to directly update the filtration without using automatic differentiation, significantly enhancing the learning process.

filtration, persistence image, persistent homology, (16 more...)

arXiv.org Machine Learning

2406.19587

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Workflow (0.68)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
